Reviews: A flexible model for training action localization with varying levels of supervision

Neural Information Processing Systems

Paper Summary: The paper describes a method for spatio-temporal human action localization in temporally untrimmed videos based on discriminative clustering [3, 47]. Its main contribution is a new action detection approach that is flexible in the sense that it can be trained with varying levels and amounts of supervision. For example, the model can be trained with very weak supervision, i.e., using only ground-truth video-level action labels, or with full supervision, i.e., dense per-frame bounding boxes and their class labels. Experimental results demonstrate the strengths and weaknesses of a wide range of supervisory signals, such as video-level action labels, a single temporal point, one ground-truth bounding box, and temporal bounds. The method is evaluated on the UCF-101-24 and DALY action detection datasets.